fix(extraction): bump EXTRACTION_MAX_TOKENS 4096 → 8192 #4
Draft
moralespanitz wants to merge 1 commit into main
Conversation
The extraction LLM was truncating JSON output at ~14 KB during BEAM Sprint 2 CR mini-slice runs on dense 10-turn chunks. The server log showed `[extractFacts] JSON parse failed (Unterminated string in JSON at position 14152 ...); attempting repair` across 6 chunks of one ingest, causing iter 7 (first attempt) to crash on conv-3.

The Anthropic max_tokens budget defaults to 4096 in extraction.ts. Going to 8192 doubles the headroom for JSON output without changing any other behavior. Cost impact is marginal: Anthropic bills only for tokens actually generated, and it is rare for extraction to use the full 8192.

Validation: the server is running with this change locally, and iter 7 v3 N=3 full-ingest reruns succeed without truncation.

A companion harness mitigation lowered chunk size from 10 to 5 turn-pairs (in atomicmemory-benchmarks PR #8) to reduce the chance of hitting the limit at all; this server-side bump is defense-in-depth.
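For reference, the change itself is a one-line constant bump. A minimal sketch of how the constant feeds the Anthropic call, assuming extraction.ts passes it straight into the standard `@anthropic-ai/sdk` messages API (the surrounding names and model string are illustrative, not the actual file contents):

```ts
import Anthropic from "@anthropic-ai/sdk";

// Output-token budget for the extraction call. Was 4096, which dense
// 10-turn chunks could exhaust mid-JSON; 8192 doubles the headroom.
const EXTRACTION_MAX_TOKENS = 8192;

const anthropic = new Anthropic();

async function extractFacts(chunkText: string) {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest", // illustrative model name
    max_tokens: EXTRACTION_MAX_TOKENS, // the only value this PR changes
    messages: [{ role: "user", content: chunkText }],
  });
  return response;
}
```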
Summary
The extraction LLM was truncating JSON output at ~14 KB during BEAM Sprint 2 CR mini-slice runs on dense 10-turn chunks. Bumping the max_tokens budget from 4096 → 8192 prevents the truncation.
Evidence
Server log during iter 7 (first attempt) before this fix:
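```
[extractFacts] JSON parse failed (Unterminated string in JSON at position 14152 ...); attempting repair
```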
Six truncations on one ingest pass. Conv-3 crashed.
After bumping to 8192: zero truncation across iter 7 v3 N=3 full-ingest reruns.
Risk
Marginal cost increase: Anthropic bills only for output tokens actually generated, and only the dense chunks that previously truncated will use more tokens. Most extractions stay well under 4096.
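If we want to verify the cost claim empirically, a hedged sketch that logs actual output-token usage per extraction call and flags any call that still hits the ceiling (this relies on the `usage` and `stop_reason` fields the Anthropic messages API returns; the wrapper and model names are illustrative):

```ts
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();

// Illustrative check: log how many output tokens each extraction actually
// uses, and warn if a call ever runs into the new 8192 ceiling.
async function extractWithUsageLog(chunkText: string) {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest", // illustrative model name
    max_tokens: 8192,
    messages: [{ role: "user", content: chunkText }],
  });
  const hitCeiling = response.stop_reason === "max_tokens";
  console.log(
    `[extractFacts] output_tokens=${response.usage.output_tokens}` +
      (hitCeiling ? " (hit max_tokens ceiling)" : "")
  );
  return response;
}
```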
Companion changes (separate PRs)
- atomicmemory-benchmarks PR #8: harness mitigation that lowers chunk size from 10 to 5 turn-pairs, reducing the chance of hitting the token limit at all.
Test plan
- `npx tsc --noEmit` — clean
- `npm test` — run with `--no-cache`